27 research outputs found

Adaptive Bayesian Quantum Tomography

Author: C. Vondrick
F. Huszár
N. M. T. Houlsby
Publication venue: 'American Physical Society (APS)'
Publication date: 05/10/2011

In this letter we revisit the problem of optimal design of quantum tomographic experiments. In contrast to previous approaches, where an optimal set of measurements is decided in advance of the experiment, we allow measurements to be adaptively and efficiently re-optimised depending on the data collected so far. We develop an adaptive statistical framework based on Bayesian inference and Shannon's information, and demonstrate a ten-fold reduction in the total number of measurements required compared with non-adaptive methods, including mutually unbiased bases.

Comment: 4 pages, 3 figures, updated references, clarified exposition
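The adaptive idea in this abstract (pick the next measurement by its expected Shannon information gain, then update the posterior) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes just two discrete hypotheses and two hypothetical binary-outcome measurements named "A" and "B", and all numbers are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, likelihoods):
    """Mutual information between the outcome and the hypothesis.

    likelihoods[h] is P(outcome = 1 | hypothesis h) for one measurement.
    """
    p1 = sum(pr * lk for pr, lk in zip(prior, likelihoods))
    marginal = entropy([p1, 1 - p1])
    conditional = sum(pr * entropy([lk, 1 - lk])
                      for pr, lk in zip(prior, likelihoods))
    return marginal - conditional

def bayes_update(prior, likelihoods, outcome):
    """Posterior over hypotheses after observing a binary outcome."""
    unnorm = [pr * (lk if outcome == 1 else 1 - lk)
              for pr, lk in zip(prior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical setup: two candidate states, two candidate measurements.
prior = [0.5, 0.5]
measurements = {
    "A": [0.5, 0.5],  # both hypotheses predict the same outcome statistics
    "B": [0.9, 0.1],  # the hypotheses disagree strongly
}

# Adaptive step: choose the measurement with the largest expected gain.
best = max(measurements,
           key=lambda m: expected_information_gain(prior, measurements[m]))
posterior = bayes_update(prior, measurements[best], outcome=1)
```

In a full adaptive loop, the posterior would become the prior for the next round, so the preferred measurement can change as data accumulate.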

arXiv.org e-Print Archive

Crossref

Memory-augmented Dense Predictive Coding for Video Representation Learning

Author: C Vondrick
C Zach
I Misra
M Noroozi
R Arandjelović
R Zhang
S Hochreiter
U Büchler
Publication date: 01/01/2020

The objective of this paper is self-supervised learning from video, in particular of representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, which allows multiple hypotheses to be made efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representation on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance relative to other approaches, with orders of magnitude less training data.

Comment: ECCV2020, Spotlight
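The phrase "convex combination" of memories can be made concrete with a small sketch: softmax attention weights are non-negative and sum to one, so the predicted state always lies inside the convex hull of the memory slots. The memory contents, scores, and function names below are hypothetical and not taken from the MemDPC code.

```python
import math

def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_from_memory(scores, memory):
    """Weighted sum of memory slots using softmax attention weights."""
    weights = softmax(scores)
    dim = len(memory[0])
    prediction = [sum(w * slot[d] for w, slot in zip(weights, memory))
                  for d in range(dim)]
    return weights, prediction

# Hypothetical memory bank with three 2-D slots, plus attention scores
# that a predictive network would normally produce.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, -1.0]

weights, prediction = predict_from_memory(scores, memory)
```

Different score vectors select different mixtures of the same slots, which is one way to read the abstract's claim about forming multiple hypotheses efficiently.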

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Long-Term Visual Object Tracking Benchmark

Author: AW Smeulders
B Babenko
C Vondrick
D Held
H Grabner
H Li
J Zhang
Jack Valmadre
JF Henriques
M Danelljan
M Kristan
M Kumar
M Mueller
P Liang
WL Lu
Y Hua
Y Li
Y Wu
Z Kalal
Publication date: 01/01/2019

We propose a new long video dataset (called Track Long and Prosper, TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real-world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20-fold larger in average duration per sequence and more than 8-fold larger in total covered duration than existing generic datasets for visual tracking. The proposed dataset paves the way to suitably assess long-term tracking performance and to train better deep learning architectures (avoiding or reducing augmentation, which may not reflect real-world behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank them according to tracking accuracy and run-time speed. We further present a thorough qualitative and quantitative evaluation highlighting the importance of the long-term aspect of tracking. Our most interesting observations are (a) existing short-sequence benchmarks fail to bring out the inherent differences between tracking algorithms, which widen when tracking on long sequences, and (b) the accuracy of trackers drops abruptly on challenging long sequences, suggesting the need for research efforts in the direction of long-term tracking.

Comment: ACCV 2018 (Oral)

arXiv.org e-Print Archive

Crossref

Minimally Needed Evidence for Complex Event Recognition in Unconstrained Videos

Author: C. G.
Cao L.
Hoai M.
Jiang Y.-G.
Muja M.
Natarajan P.
Vondrick C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

This paper addresses the fundamental question: how do humans recognize complex events in videos? Normally, humans view videos in a sequential manner. We hypothesize that humans can make high-level inferences, such as whether an event is present in a video, by looking at a very small number of frames, not necessarily in linear order. We attempt to verify this cognitive capability of humans and to discover the Minimally Needed Evidence (MNE) for each event. To this end, we introduce an online game-based event quiz that facilitates selection of the minimal evidence required by humans to judge the presence or absence of a complex event in an open source video. Each video is divided into a set of temporally coherent microshots (1.5 secs in length), which are revealed only on player request. The player’s task is to identify the positive and negative occurrences of the given target event with a minimal number of requests to reveal evidence. Incentives are given to players for correct identification with the minimal number of requests. Our extensive human study using the game quiz validates our hypothesis: 55% of videos need only one microshot for correct human judgment, and events of varying complexity require different amounts of evidence for human judgment. In addition, the proposed notion of MNE enables us to select discriminative features, drastically improving the speed and accuracy of a video retrieval system.

CiteSeerX

Crossref

ImageNet Large Scale Visual Recognition Challenge

Author: A Geiger
A Torralba
Aditya Khosla
Alexander C. Berg
Andrej Karpathy
B Alexe
B Yao
C Liu
C Vondrick
DG Lowe
GA Miller
Hao Su
J Uijlings
Jia Deng
Jonathan Krause
K Crammer
KEA Sande van de
Li Fei-Fei
M Everingham
Michael Bernstein
Olga Russakovsky
P Arbelaez
P Felzenszwalb
S Thorpe
Sanjeev Satheesh
Sean Ma
T Ahonen
Zhiheng Huang
Publication date: 01/01/2015

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Carolina Digital Repository

ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation

Author: Abernathey Ryan P.
Ahmed Fiaz
Anandkumar Anima
Bader David C.
Baldi Pierre
Barnes Elizabeth
Behrens Gunnar
Beucler Tom
Bhouri Mohamed Aziz
Brenowitz Noah D.
Bretherton Christopher
Busecke Julius
Caldwell Peter
Chuang Wayne
Eyring Veronika
Ferretti Savannah L.
Geneva Nicholas
Gentine Pierre
Gupta Ritwik
Han Yilun
Hannah Walter
Harrop Bryce E.
Hillman Benjamin R.
Huang Yu
Iglesias-Suarez Fernando
Jantre Sanket
Jenney Andrea M.
Kashinath Karthik
Khairoutdinov Marat
Kurth Thorsten
Lin Jerry
Liu Nana
Loose Nora
Lutsko Nicholas
Lütjens Björn
Ma Po-Lun
Mandt Stephan
Mooers Griffin
Neelin J. D.
Pathak Jaideep
Peng Liran
Pritchard Michael
Randall David
Shamekh Sara
Stern Charles I.
Subramaniam Akshay
Taylor Mark A.
Urban Nathan M.
Vondrick Carl
Will Justus Christopher
Yu Rose
Yu Sungduk
Yuval Janni
Zanna Laure
Zhang Guang
Zheng Tian
Publication date: 01/01/2023

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher-fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.

Institute of Transport Research:Publications

Do We Need More Training Data or Better Models for Object Detection?

Author: Carl Vondrick
Charless C. Fowlkes
Deva Ramanan
Xiangxin Zhu
Publication date: 01/01/2012

(Work performed while at UC Irvine) Datasets for training object recognition systems are steadily growing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or if models are close to saturating due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of scanning-window templates defined on oriented gradient features, trained with discriminative classifiers. We investigate the performance of mixtures of templates as a function of the number of templates (complexity) and the amount of training data. We find that additional data does help, but only with correct regularization and treatment of noisy examples or “outliers” in the training data. Surprisingly, the performance of problem domain-agnostic mixture models appears to saturate quickly (∼10 templates and ∼100 positive training examples per template). However, compositional mixtures (implemented via composed parts) give much better performance because they share parameters among templates and can synthesize new templates not encountered during training. This suggests there is still room to improve performance with linear classifiers and the existing feature space through improved representations and learning algorithms.

CiteSeerX

Crossref

Speeding up inference on deep neural networks for object detection by performing partial convolution

Author: AR Pathak
C Vondrick
Wattanapong Kurdthongmee
Publication venue: 'Springer Science and Business Media LLC'

Crossref